我们设计和分析了量子变压器,扩展了最先进的经典变压器神经网络体系结构,已知在自然语言处理和图像分析中表现出色。在先前用于数据加载和正交神经层的参数化量子电路的工作的基础上,我们引入了三种量子注意机制,包括基于复合矩阵的量子变压器。这些量子体系结构可以使用浅量子电路构建,并可以提供定性不同的分类模型。与最佳的经典变压器和其他经典基准相比,我们对标准医疗图像数据集进行了量子变压器的广泛模拟,这些量子变压器表现出竞争力,有时表现更好。与经典算法相对于分类图像的大小,我们的量子注意层的计算复杂性被证明是有利的。与拥有数百万参数的最佳经典方法相比,我们的量子体系结构具有数千个参数。最后,我们在超导量子计算机上实施了量子变压器,并获得了多达六个量子实验的令人鼓舞的结果。
translated by 谷歌翻译
依靠随机矩阵理论(RMT),本文研究了不对称的秩序与高斯噪声的D $ Spiked张量模型。使用奇异矢量的变分定义和[LIM,2005]的值,我们表明所考虑的模型的分析归结为分析了由此构造的等效尖刺对称\ XYLY矩阵的分析研究的统计{凹陷}与最佳等级-1近似相关的奇异矢量。我们的方法允许确切地表征几乎肯定的渐近奇异值和相应的奇异矢量与真正的尖峰组件的对齐,当$ \ frac {n_i} {\ sum_ {j = 1} ^ d n_j} \ to c_i \中[0,1] $ $ N_I $的张量尺寸。与大多数依赖于统计物理学的工具依赖于统计物理学的其他作品相比,我们的结果仅依赖于古典RMT工具,如Stein的引理。最后,有关尖刺随机矩阵的经典RMT结果被恢复为特定情况。
translated by 谷歌翻译
Non-linear state-space models, also known as general hidden Markov models, are ubiquitous in statistical machine learning, being the most classical generative models for serial data and sequences in general. The particle-based, rapid incremental smoother PaRIS is a sequential Monte Carlo (SMC) technique allowing for efficient online approximation of expectations of additive functionals under the smoothing distribution in these models. Such expectations appear naturally in several learning contexts, such as likelihood estimation (MLE) and Markov score climbing (MSC). PARIS has linear computational complexity, limited memory requirements and comes with non-asymptotic bounds, convergence results and stability guarantees. Still, being based on self-normalised importance sampling, the PaRIS estimator is biased. Our first contribution is to design a novel additive smoothing algorithm, the Parisian particle Gibbs PPG sampler, which can be viewed as a PaRIS algorithm driven by conditional SMC moves, resulting in bias-reduced estimates of the targeted quantities. We substantiate the PPG algorithm with theoretical results, including new bounds on bias and variance as well as deviation inequalities. Our second contribution is to apply PPG in a learning framework, covering MLE and MSC as special examples. In this context, we establish, under standard assumptions, non-asymptotic bounds highlighting the value of bias reduction and the implicit Rao--Blackwellization of PPG. These are the first non-asymptotic results of this kind in this setting. We illustrate our theoretical results with numerical experiments supporting our claims.
translated by 谷歌翻译
Nowadays, the current neural network models of dialogue generation(chatbots) show great promise for generating answers for chatty agents. But they are short-sighted in that they predict utterances one at a time while disregarding their impact on future outcomes. Modelling a dialogue's future direction is critical for generating coherent, interesting dialogues, a need that has led traditional NLP dialogue models that rely on reinforcement learning. In this article, we explain how to combine these objectives by using deep reinforcement learning to predict future rewards in chatbot dialogue. The model simulates conversations between two virtual agents, with policy gradient methods used to reward sequences that exhibit three useful conversational characteristics: the flow of informality, coherence, and simplicity of response (related to forward-looking function). We assess our model based on its diversity, length, and complexity with regard to humans. In dialogue simulation, evaluations demonstrated that the proposed model generates more interactive responses and encourages a more sustained successful conversation. This work commemorates a preliminary step toward developing a neural conversational model based on the long-term success of dialogues.
translated by 谷歌翻译
Three-dimensional (3D) technologies have been developing rapidly recent years, and have influenced industrial, medical, cultural, and many other fields. In this paper, we introduce an automatic 3D human head scanning-printing system, which provides a complete pipeline to scan, reconstruct, select, and finally print out physical 3D human heads. To enhance the accuracy of our system, we developed a consumer-grade composite sensor (including a gyroscope, an accelerometer, a digital compass, and a Kinect v2 depth sensor) as our sensing device. This sensing device is then mounted on a robot, which automatically rotates around the human subject with approximate 1-meter radius, to capture the full-view information. The data streams are further processed and fused into a 3D model of the subject using a tablet located on the robot. In addition, an automatic selection method, based on our specific system configurations, is proposed to select the head portion. We evaluated the accuracy of the proposed system by comparing our generated 3D head models, from both standard human head model and real human subjects, with the ones reconstructed from FastSCAN and Cyberware commercial laser scanning systems through computing and visualizing Hausdorff distances. Computational cost is also provided to further assess our proposed system.
translated by 谷歌翻译
We propose a 6D RGB-D odometry approach that finds the relative camera pose between consecutive RGB-D frames by keypoint extraction and feature matching both on the RGB and depth image planes. Furthermore, we feed the estimated pose to the highly accurate KinectFusion algorithm, which uses a fast ICP (Iterative Closest Point) to fine-tune the frame-to-frame relative pose and fuse the depth data into a global implicit surface. We evaluate our method on a publicly available RGB-D SLAM benchmark dataset by Sturm et al. The experimental results show that our proposed reconstruction method solely based on visual odometry and KinectFusion outperforms the state-of-the-art RGB-D SLAM system accuracy. Moreover, our algorithm outputs a ready-to-use polygon mesh (highly suitable for creating 3D virtual worlds) without any postprocessing steps.
translated by 谷歌翻译
In this paper, a Kinect-based distributed and real-time motion capture system is developed. A trigonometric method is applied to calculate the relative position of Kinect v2 sensors with a calibration wand and register the sensors' positions automatically. By combining results from multiple sensors with a nonlinear least square method, the accuracy of the motion capture is optimized. Moreover, to exclude inaccurate results from sensors, a computational geometry is applied in the occlusion approach, which discovers occluded joint data. The synchronization approach is based on an NTP protocol that synchronizes the time between the clocks of a server and clients dynamically, ensuring that the proposed system is a real-time system. Experiments for validating the proposed system are conducted from the perspective of calibration, occlusion, accuracy, and efficiency. Furthermore, to demonstrate the practical performance of our system, a comparison of previously developed motion capture systems (the linear trilateration approach and the geometric trilateration approach) with the benchmark OptiTrack system is conducted, therein showing that the accuracy of our proposed system is $38.3\%$ and 24.1% better than the two aforementioned trilateration systems, respectively.
translated by 谷歌翻译
With the increase in health consciousness, noninvasive body monitoring has aroused interest among researchers. As one of the most important pieces of physiological information, researchers have remotely estimated the heart rate (HR) from facial videos in recent years. Although progress has been made over the past few years, there are still some limitations, like the processing time increasing with accuracy and the lack of comprehensive and challenging datasets for use and comparison. Recently, it was shown that HR information can be extracted from facial videos by spatial decomposition and temporal filtering. Inspired by this, a new framework is introduced in this paper to remotely estimate the HR under realistic conditions by combining spatial and temporal filtering and a convolutional neural network. Our proposed approach shows better performance compared with the benchmark on the MMSE-HR dataset in terms of both the average HR estimation and short-time HR estimation. High consistency in short-time HR estimation is observed between our method and the ground truth.
translated by 谷歌翻译
3D pose estimation is a challenging problem in computer vision. Most of the existing neural-network-based approaches address color or depth images through convolution networks (CNNs). In this paper, we study the task of 3D human pose estimation from depth images. Different from the existing CNN-based human pose estimation method, we propose a deep human pose network for 3D pose estimation by taking the point cloud data as input data to model the surface of complex human structures. We first cast the 3D human pose estimation from 2D depth images to 3D point clouds and directly predict the 3D joint position. Our experiments on two public datasets show that our approach achieves higher accuracy than previous state-of-art methods. The reported results on both ITOP and EVAL datasets demonstrate the effectiveness of our method on the targeted tasks.
translated by 谷歌翻译
To increase the quality of citizens' lives, we designed a personalized smart chair system to recognize sitting behaviors. The system can receive surface pressure data from the designed sensor and provide feedback for guiding the user towards proper sitting postures. We used a liquid state machine and a logistic regression classifier to construct a spiking neural network for classifying 15 sitting postures. To allow this system to read our pressure data into the spiking neurons, we designed an algorithm to encode map-like data into cosine-rank sparsity data. The experimental results consisting of 15 sitting postures from 19 participants show that the prediction precision of our SNN is 88.52%.
translated by 谷歌翻译